Adaptive Strategies and Regret Minimization in Arbitrarily Varying Markov Environments
Similar resources
Nonstochastic bandits: Countable decision set, unbounded costs and reactive environments
The nonstochastic multi-armed bandit problem, first studied by Auer, Cesa-Bianchi, Freund, and Schapire in 1995, is a game of repeatedly choosing one decision from a set of decisions (“experts”) under partial observation: in each round t, only the cost of the decision played is observable. A regret minimization algorithm plays this game while achieving sublinear regret relative to each decisi...
Adaptive Regret Minimization in Bounded-Memory Games
Online learning algorithms that minimize regret provide strong guarantees in situations that involve repeatedly making decisions in an uncertain environment, e.g. a driver deciding what route to drive to work every day. While regret minimization has been extensively studied in repeated games, we study regret minimization for a richer class of games called bounded memory games. In each round of ...
Inference-based Decision Making in Games
Background: Reinforcement learning in complex games has traditionally been the domain of value or policy iteration algorithms, resulting from their effectiveness in planning in Markov decision processes, before algorithms based on regret minimization guarantees such as upper confidence bounds applied to trees (UCT) and counterfactual regret minimization were developed and proved to be very succe...
A Regret Minimization Approach in Product Portfolio Management with respect to Customers’ Price-sensitivity
In an uncertain and competitive environment, product portfolio management (PPM) becomes more challenging for manufacturers to decide what to make and establish the most beneficial product portfolio. In this paper, a novel approach in PPM is proposed in which the environment uncertainty, competitors’ behavior and customer’s satisfaction are simultaneously considered as the most important criteri...
A Robust Adaptive Observer-Based Time Varying Fault Estimation
This paper presents a new observer design methodology for time-varying actuator fault estimation. A new linear matrix inequality (LMI) design algorithm is developed to tackle the limitations (e.g. equality constraint and robustness problems) of the well-known, so-called fast adaptive fault estimation observer (FAFE). The FAFE is capable of estimating a wide range of time-varying actuator fault...